Size-Consistent Statistics for Anomaly Detection in Dynamic Networks
Abstract
In this paper, we focus on the task of anomaly detection in a dynamic network, where the structure of the network changes over time. For example, each time step could represent one day's worth of activity on an e-mail network or the communications of a computer network. The goal is to identify any time steps where the pattern of communication appears abnormal compared to that of other time steps. We approach this problem as a hypothesis testing task: the null hypothesis is that the time step under scrutiny represents normal behavior of the network, while the alternative hypothesis is that it is anomalous. The null distribution is constructed from graph examples observed in the past, and the test statistics are various network statistics. Whenever the null hypothesis is rejected for a time step, we flag that time step as an anomaly.
A typical real-world network experiences many changes in the course of its natural behavior that are not anomalous events. The most common of these is variation in the volume of edges. In an e-mail network where the edges represent messages, the network could be growing in size over time, or there could be random variance in the number of messages sent each day. The statistics used to measure network properties are usually intended to capture some property of the network other than the volume of edges: for example, the common clustering coefficient is a measure of transitivity, the propensity for triangular interactions in the network. However, statistics such as the clustering coefficient are Statistically Inconsistent: as the size of the network changes, more or fewer edges or nodes change the output of the statistic even when the transitivity property is constant, which makes graph size a confounding factor. Statistical consistency and inconsistency are described in more detail in Section 6.3.
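The hypothesis-testing loop described above can be sketched as follows. This is a minimal sketch, not the authors' implementation: it assumes the null distribution is simply the statistic values from a sliding window of past time steps and uses a two-sided empirical p-value with an add-one correction. The names `empirical_p_value` and `flag_anomalies`, and the `window` and `alpha` parameters, are hypothetical.

```python
from typing import Sequence


def empirical_p_value(history: Sequence[float], observed: float) -> float:
    """Two-sided empirical p-value of `observed` against past statistic values.

    Assumption (hypothetical setup): the null distribution is just the list of
    statistic values computed on past time steps believed to be normal.
    """
    n = len(history)
    if n == 0:
        raise ValueError("need at least one past time step to form a null distribution")
    # Count past values at least as extreme as the observed one on each side.
    lower = sum(1 for x in history if x <= observed)
    upper = sum(1 for x in history if x >= observed)
    # Add-one correction so the p-value is never exactly zero.
    one_sided = (min(lower, upper) + 1) / (n + 1)
    return min(1.0, 2 * one_sided)


def flag_anomalies(stats: Sequence[float], window: int, alpha: float = 0.05) -> list[int]:
    """Return indices of time steps whose statistic rejects the null hypothesis."""
    flagged = []
    for t in range(window, len(stats)):
        history = stats[t - window:t]  # sliding window of past time steps
        if empirical_p_value(history, stats[t]) < alpha:
            flagged.append(t)
    return flagged
```

For example, `flag_anomalies([0.30, 0.31, 0.29, 0.30, 0.32, 0.31, 0.30, 0.29, 0.31, 0.30, 0.55, 0.30], window=10, alpha=0.2)` flags only index 10. With `n` past time steps the smallest achievable two-sided p-value here is 2/(n+1), which is why this toy example uses a loose threshold; a stricter `alpha` such as 0.05 would need a window of at least 40 past steps. If the statistic fed into this loop is size-inconsistent, an ordinary change in edge volume can be enough to trigger a rejection.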
Even on an Erdős-Rényi network, which does not explicitly capture transitive relationships through a network property, the clustering coefficient grows as the number of edges in the network increases, because more triangles are closed by random chance. When a statistic varies with the number of edges in the network, it is not valid to compare different network time steps using that statistic unless the number of edges is constant across time steps. The flowchart in Figure 2 outlines the detection approach: unless the statistic is carefully defined to be robust to confounding factors, it is impossible to determine which of the factors that generated the graph is responsible for a detected anomaly. Table I shows a glossary of terms used throughout this Chapter. Some, such as G_t, V_t, and W_t, come from the dynamic graph definitions used previously; the others will be explained as they are used throughout the Chapter. Figure 1 shows the effect of statistical (in)consistency. In the experiment, pairs of graphs were generated using a Chung-Lu generative model (described in Section 6.6) with a certain number of total edges. Subfigure (a) shows the values of a Size Consistent Statistic called Probability Mass Shift (described in Section 6.4) calculated on pairs of graphs, while Subfigure (b) shows the same for the NetSimile statistic (described in a previous Chapter). Each black point shows the average value of 100 gener-
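The Erdős-Rényi effect described above is easy to reproduce. The sketch below is an illustration only, assuming the G(n, m) variant of the model and the global clustering coefficient (transitivity, 3 × triangles / connected triples); it is not the Chung-Lu experiment behind Figure 1, and the helper names `gnm_random_graph` and `global_clustering` are hypothetical. Because every edge of G(n, m) is placed uniformly at random, the expected transitivity is roughly the edge density 2m / (n(n - 1)), so it rises with m even though no transitivity mechanism is present.

```python
import itertools
import random


def gnm_random_graph(n: int, m: int, seed: int) -> list[set[int]]:
    """Erdős-Rényi G(n, m): n nodes, m edges chosen uniformly without replacement."""
    rng = random.Random(seed)
    all_pairs = list(itertools.combinations(range(n), 2))
    adj = [set() for _ in range(n)]
    for u, v in rng.sample(all_pairs, m):
        adj[u].add(v)
        adj[v].add(u)
    return adj


def global_clustering(adj: list[set[int]]) -> float:
    """Transitivity: 3 * (#triangles) / (#connected triples)."""
    closed = 0   # wedges whose two endpoints are adjacent (each triangle counted 3 times)
    triples = 0  # wedges (paths of length 2), counted by their center node
    for nbrs in adj:
        d = len(nbrs)
        triples += d * (d - 1) // 2
        for v, w in itertools.combinations(nbrs, 2):
            if w in adj[v]:
                closed += 1
    return closed / triples if triples else 0.0


if __name__ == "__main__":
    n = 200
    for m in (500, 1000, 2000, 4000):
        # Average over a few seeds; compare with the edge density 2m / (n(n - 1)).
        avg = sum(global_clustering(gnm_random_graph(n, m, s)) for s in range(5)) / 5
        print(m, round(avg, 3), round(2 * m / (n * (n - 1)), 3))
```

Running the script should show the clustering coefficient tracking the edge density as m grows, which is exactly the size dependence that makes it unsafe to compare time steps with different edge counts.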
Similar resources
Dynamic anomaly detection by using incremental approximate PCA in AODV-based MANETs
Mobile Ad-hoc Networks (MANETs) are more vulnerable than other networks because of inherent properties such as dynamic topology and the absence of infrastructure. Therefore, a considerable challenge for these networks is developing a method that can detect anomalies with high accuracy as the network topology changes. In this paper, two methods proposed for dynamic anom...
Anomaly Detection in Dynamic Networks of Varying Size
Dynamic networks, also called network streams, are an important data representation that applies to many real-world domains. Many sets of network data such as e-mail networks, social networks, or internet traffic networks are best represented by a dynamic network due to the temporal component of the data. One important application in the domain of dynamic network analysis is anomaly de...
Anomaly detection in dynamic networks: a survey
Anomaly detection is an important problem with multiple applications, and thus has been studied for decades in various research domains. In the past decade there has been a growing interest in anomaly detection in data represented as networks, or graphs, largely because of their robust expressiveness and their natural ability to represent complex relationships. Originally, techniques focused on...
A Survey of Anomaly Detection Approaches in Internet of Things
Internet of Things is an ever-growing network of heterogeneous and constrained nodes which are connected to each other and to the Internet. Security plays an important role in such networks. Experience has shown that encryption and authentication are not enough for the security of networks, and an Intrusion Detection System is required to detect and prevent attacks from malicious nodes. In this ...
Dynamic Network Evolution: Models, Clustering, Anomaly Detection
Traditionally, research on graph theory focused on studying graphs that are static. However, almost all real networks are dynamic in nature and large in size. Quite recently, research areas for studying the topology, evolution, applications of complex evolving networks and processes occurring in them and governing them attracted attention from researchers. In this work, we review the significan...
Journal: CoRR
Volume: abs/1608.00712
Publication date: 2016